Flexible Japanese Sentence Compression by Relaxing Unit Constraints
Abstract
Sentence compression is important in a wide range of natural language processing applications. Previous approaches to Japanese sentence compression can be divided into two groups. Word-based methods extract a subset of words from a sentence to shorten it, while bunsetsu-based methods extract a subset of bunsetsu (a bunsetsu is a text unit that consists of content words and the function words that follow them). In general, bunsetsu-based methods perform better than word-based methods. However, bunsetsu-based methods cannot drop unimportant words from within a bunsetsu, because they must follow constraints under which each bunsetsu is treated as a unit. In this paper, we propose a novel compression method to overcome this disadvantage. Our method relaxes the constraints using Lagrangian relaxation and shortens each bunsetsu if it contains unimportant words. Experimental results show that our method effectively compresses a sentence while preserving its important information and grammaticality.

Title and Abstract in Japanese

ユニット制約の緩和による柔軟な日本語文圧縮

文圧縮は,自然言語処理の様々なアプリケーションにおいて重要である.日本語文に対する既存の圧縮手法は二種類に分けられる.単語ベースの手法は文から単語集合を選出し,圧縮文とする.一方,文節ベースの手法は文から文節集合を選出し,圧縮文とする.基本的には後者の方が良く機能する.しかし,文節ベースの手法は,文節をユニットとして扱うという制約があるため,個々の文節を圧縮できない.本稿では,この欠点を克服する新しい圧縮手法を提案する.提案手法はラグランジュ緩和を用いて上の制約を緩和し,各文節を圧縮する.実験の結果,提案手法によって原文の情報を多く保持する文法的な圧縮文を生成できることが分かった.
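The abstract does not spell out the authors' formulation, so the following is only a generic sketch of how Lagrangian relaxation can soften a "treat each unit as a whole" constraint. It assumes a toy model in which each word and each unit (bunsetsu) has a fixed score, and the relaxed constraint is that a word is kept exactly when its unit is kept. The function name `relax_unit_constraints`, its parameters, and the scoring scheme are all hypothetical and are not taken from the paper.

```python
from typing import List, Tuple


def relax_unit_constraints(
    word_scores: List[float],
    word_to_unit: List[int],
    unit_scores: List[float],
    max_words: int,
    iterations: int = 50,
    step: float = 0.5,
) -> Tuple[List[int], List[int], bool]:
    """Toy subgradient loop for the relaxed constraint x[i] == y[word_to_unit[i]].

    Hypothetical illustration only; not the method described in the paper.
    Returns (word selection x, unit selection y, whether the two agreed).
    """
    n_words, n_units = len(word_scores), len(unit_scores)
    u = [0.0] * n_words  # one Lagrange multiplier per word
    x = [0] * n_words
    y = [0] * n_units
    for t in range(1, iterations + 1):
        # Word subproblem: keep up to max_words words with positive adjusted score.
        adjusted = [word_scores[i] + u[i] for i in range(n_words)]
        ranked = sorted(range(n_words), key=lambda i: adjusted[i], reverse=True)
        keep = {i for i in ranked[:max_words] if adjusted[i] > 0}
        x = [1 if i in keep else 0 for i in range(n_words)]

        # Unit subproblem: keep a unit if its adjusted score is positive.
        y = []
        for b in range(n_units):
            penalty = sum(u[i] for i in range(n_words) if word_to_unit[i] == b)
            y.append(1 if unit_scores[b] - penalty > 0 else 0)

        # Stop when every word agrees with its unit.
        disagreement = [x[i] - y[word_to_unit[i]] for i in range(n_words)]
        if all(d == 0 for d in disagreement):
            return x, y, True

        # Subgradient step on the dual, with a decaying step size.
        for i in range(n_words):
            u[i] -= (step / t) * disagreement[i]
    return x, y, False
```

If the loop ends without agreement, the word-level selection `x` can keep only part of a unit, which is the kind of within-unit shortening that a strict unit constraint would forbid. A real system would also need grammaticality constraints, which this sketch omits.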
Similar resources
Japanese Sentence Compression with a Large Training Dataset
In English, high-quality sentence compression models that delete words have been trained on large, automatically created training datasets. We work on Japanese sentence compression using a similar approach. To create a large Japanese training dataset, a method for creating an English training dataset is modified based on the characteristics of the Japanese language. The created dataset is used to train...
Constraints and Mechanisms in Long-Distance Dependency Formation
Title of Document: CONSTRAINTS AND MECHANISMS IN LONG-DISTANCE DEPENDENCY FORMATION. Masaya Yoshida, Ph.D. 2006 Directed By: Professor Colin Phillips, Department of Linguistics This thesis aims to reveal the mechanisms and constraints involved in long-distance dependency formation in the static knowledge of language and in real-time sentence processing. Special attention is paid to the grammar...
Simultaneous English-Japanese Spoken Language Translation Based on Incremental Dependency Parsing and Transfer
This paper proposes a method for incrementally translating English spoken language into Japanese. To realize simultaneous translation between languages with different word order, such as English and Japanese, our method utilizes the feature that the word order of a target language is flexible. To resolve the problem of generating a grammatically incorrect sentence, our method uses dependency st...
Global inference for sentence compression: an integer linear programming approach
In this thesis we develop models for sentence compression. This text rewriting task has recently attracted a lot of attention due to its relevance for applications (e.g., summarisation) and simple formulation by means of word deletion. Previous models for sentence compression have been inherently local and thus fail to capture the long range dependencies and complex interactions involved in tex...
Multi-objective optimization of compression refrigeration cycle of Unit 132 South Pars refineries
The purpose of this paper is the multi-objective optimization of a refrigeration cycle by optimizing all components of the cycle, including the heat exchangers, air condenser, evaporator and super-heater. The cycle studied is the compression refrigeration cycle of Unit 132 of the Third Refinery in South Pars, which provides chilled water for cooling refinery equipment. Cycle will be performed by t...